Online Thinning for High Volume Streaming Data

نویسندگان

  • Xin J. Hunt
  • Rebecca Willett
چکیده

In an era of ubiquitous large-scale streaming data, the availability of data far exceeds the capacity of expert human analysts. In many settings, such data is either discarded or stored unprocessed in data centers. This paper proposes a method of online data thinning, in which large-scale streaming datasets are winnowed to preserve unique, anomalous, or salient elements for timely expert analysis. At the heart of this proposed approach is an online anomaly detection method based on dynamic, low-rank Gaussian mixture models. Specifically, the high-dimensional covariance matrices associated with the Gaussian components are associated with low-rank models. According to this model, most observations lie near a union of subspaces. The low-rank modeling mitigates the curse of dimensionality associated with anomaly detection for high-dimensional data, and recent advances in subspace clustering and subspace tracking allow the proposed method to adapt to dynamic environments. The resulting algorithms are scalable, efficient, and are capable of operating in real time. Experiments on wide-area motion imagery illustrate the efficacy of the proposed approach.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

متن کامل

Design and Test of the Real-time Text mining dashboard for Twitter

One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in dataset that is currently being produced at high speed, large volumes and with a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing the huge amount of data, today has become one of the concerns of data science scholar...

متن کامل

Examining students' online interaction in a live video streaming environment using data mining and text mining

0747-5632/$ see front matter 2012 Elsevier Ltd. A http://dx.doi.org/10.1016/j.chb.2012.07.020 ⇑ Tel.: +1 757 683 5008; fax: +1 757 683 5639. E-mail address: [email protected] This study analyses the online questions and chat messages automatically recorded by a live video streaming (LVS) system using data mining and text mining techniques. We apply data mining and text mining techniques to analyze tw...

متن کامل

Flow-Based Dissection of Network Services

The unprecedented success story of the Internet is largely due to rich and constantly emerging applications such as online social networks, video streaming, etc. To characterize the Internet and its usage, high-level metrics such as traffic volume or topology-related measures have been widely used in the past. However, researchers and network professionals still lack concepts to capture Interne...

متن کامل

Distributed Decision Tree Learning for Mining Big Data Streams

Web companies need to effectively analyse big data in order to enhance the experiences of their users. They need to have systems that are capable of handling big data in term of three dimensions: volume as data keeps growing, variety as the type of data is diverse, and velocity as the is continuously arriving very fast into the systems. However, most of the existing systems have addressed at mo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017